Parallel Algorithms for Solving Markov Decision Process
نویسندگان
چکیده
Markov decision process (MDP) provides the foundations for a number of problems, such as artificial intelligence studying, automated planning and reinforcement learning. MDP can be solved efficiently in theory. However, for large scenarios, more investigations are needed to reveal practical algorithms. Algorithms for solving MDP have a natural concurrency. In this paper, we present parallel algorithms based on dynamic programming. Meanwhile, the cost of computation and communication complexity of this method is analyzed. Moreover, experimental results demonstrate excellent speedups and scalability.
منابع مشابه
A Parallel Algorithm for POMDP Solution
Most exact algorithms for solving partially observable Markov decision processes (POMDPs) are based on a form of dynamic programming in which a piecewise-linear and convex representation of the value function is updated at every iteration to more accurately approximate the true value function. However, the process is computationally expensive, thus limiting the practical application of POMDPs i...
متن کاملCold standby redundancy optimization for nonrepairable series-parallel systems: Erlang time to failure distribution
In modeling a cold standby redundancy allocation problem (RAP) with imperfect switching mechanism, deriving a closed form version of a system reliability is too difficult. A convenient lower bound on system reliability is proposed and this approximation is widely used as a part of objective function for a system reliability maximization problem in the literature. Considering this lower bound do...
متن کاملProducing efficient error-bounded solutions for transition independent decentralized mdps
There has been substantial progress on algorithms for single-agent sequential decision making using partially observable Markov decision processes (POMDPs). A number of efficient algorithms for solving POMDPs share two desirable properties: error-bounds and fast convergence rates. Despite significant efforts, no algorithms for solving decentralized POMDPs benefit from these properties, leading ...
متن کاملA Modified Policy Iteration Algorithm for Discounted Reward Markov Decision Processes
The running time of the classical algorithms of the Markov Decision Process (MDP) typically grows linearly with the state space size, which makes them frequently intractable. This paper presents a Modified Policy Iteration algorithm to compute an optimal policy for large Markov decision processes in the discounted reward criteria and under infinite horizon. The idea of this algorithm is based o...
متن کاملLids - P - 2172 Asynchronous Stochastic Approximation and Q - Learning 1
We provide some general results on the convergence of a class of stochastic approximation algorithms and their parallel and asynchronous variants. We then use these results to study the Q-learning algorithm, a reinforcement learning method for solving Markov decision problems, and establish its convergence under conditions more general than previously available.
متن کامل